-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(sdk): Add comments to IR YAML file #8467
Conversation
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @JOCSTAA!
One high level consideration: we'll want to also parse the description when we read load a component/pipeline from YAML, since this information is not preserved in PipelineSpec
. This is so that the description isn't lost when a user authors a pipeline, compiles it, loads it, then compiles it again. We can discuss this offline.
sdk/python/kfp/compiler/compiler.py
Outdated
description = pipeline_func.description or None | ||
|
||
else: | ||
description = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's possible I'm missing something, but it looks like this is to handle the fact that GraphComponent
s have a .description
, but other BaseComponent
s (YamlComponent
, PythonComponent
) don't. Do you think we could put .description
on the BaseComponent
abstract base class and implement for all concrete classes, that way all component/pipeline types can be treated the same way by the compiler?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also allow us to avoid expanding the interface of write_pipeline_spec_to_file
to support comments.
if pipeline_description: | ||
comment += '# Description: ' + pipeline_description + '\n' | ||
comment += add_inputs() | ||
comment += add_outputs() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Consider obtaining all the different sections in a list, then using a string joining method. I think this would slightly improve readability and reduce the likelihood that a \n
is omitted on one of the strings somewhere.
'\n'.join(sections)
The same pattern might be helpful in add_inputs
and add_outputs
} | ||
|
||
def add_inputs(): | ||
if 'inputDefinitions' in pipeline_spec['root']: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: What do you think about separating the information collection from the information presentation in add_inputs
and add_outputs
for readability [ref]?
@@ -1490,5 +1490,161 @@ def pipeline_with_input(boolean: bool = False): | |||
.default_value.bool_value, True) | |||
|
|||
|
|||
class TestYamlComments(unittest.TestCase): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests look good! A couple more test cases come to mind:
- compiling a Python component
@dsl.component
- compiling a container component
@dsl.container_component
- compiling a pipeline or component with inputs that don't have default values
- compiling a pipeline with an explicit
name
argument - compiling a pipeline or component with a
float
,int
,dict
, andlist
- compiling a pipeline or component with an artifact input and output
I don't think these each necessarily must be their own test (though that's fine too -- it's more specific but also more verbose).
Bundling these characteristics together and doing full comment assertions (comparing the actual full comment against an expected full comment) could make tackling these many cases a bit more manageable and similar to real-world components we'll be handling.
/retest |
/retest |
/retest |
@@ -816,14 +816,28 @@ def load_from_component_yaml(cls, component_yaml: str) -> 'ComponentSpec': | |||
Component spec in the form of V2 ComponentSpec. | |||
""" | |||
|
|||
def extract_description(component_yaml: str) -> str: | |||
heading = '# Description: ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this redundant with extract_description_from_command
[ref]? If so, can we remove one?
I think extracting from the comment (this function) is a better approach since it works for both components and pipelines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion but I tested and confirmed that there are some cases where the description would not be in the pipeline spec yaml file but would be in the comments (for example the user defines the description using my_comp.description = 'description'
and not as a functions docstring ) and vice versa (pipeline that was compiled previously without a comment) so i think both are necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. Thanks for explaining. The support for existing IR YAML is a great point. Am I correct that this only applies for compiled components? I believe any pipeline function docstrings would be lost during compilation.
I'm not sure I've ever seen description set via my_comp.description = 'description'
. Do you have any example of this?
|
||
return comment_strings | ||
|
||
if 'root' not in pipeline_spec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any instances where this condition would be true?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came across a test case when running the python execution tests which had no 'root'
. [ref]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the link is broken. Can you provide another?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this. It's attributed to these tests: sdk/python/kfp/compiler/pipeline_spec_builder_test.py::TestWriteIrToFile
, which attempt to write an empty PipelineSpec
.
Can you update these these? It's preferable to not program around our tests in the library code.
def extract_description(component_yaml: str) -> str: | ||
heading = '# Description: ' | ||
if heading in component_yaml: | ||
description = component_yaml.splitlines()[2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll also need to be able to handle multiline docstrings
name_string = '# Name: my-pipeline' | ||
|
||
# test name is in comments | ||
self.assertTrue(name_string in yaml_content) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: What you have is perfectly valid, but perhaps use the approach of comparing the full comment string throughout (what you've done in the last test method). It's easier to reason about the correctness of the tests that compare the full comment string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do feel like having it this way for these first few tests gives more specificity as to what exactly we are searching for in the comments, but if you do have a strong opinion against it, i could always implement it that way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a strong opinion. We can leave it as is if you prefer.
/retest |
@JOCSTAA, ignore |
Once GoogleCloudPlatform/oss-test-infra#1843 is merged, tfx test should no longer be required on master branch. |
retest/ |
1 similar comment
retest/ |
retest/ |
/test kubeflow-pipelines-sdk-python39 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you again for your work on this, @JOCSTAA. The comments are mostly nitpicks related to maintainability.
# sample_input3: system.Model | ||
# sample_input4: float [Default: 3.14] | ||
# sample_input5: list [Default: [1.0, 2.0, 3.0]] | ||
# sample_input6: dict [Default: {'one': 1.0, 'three': 3.0, 'two': 2.0}] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: String quotes are "
above for a default of type string but '
here in a dict
comment_sections.append('# PIPELINE DEFINITION') | ||
comment_sections.append('# Name: ' + pipeline_spec['pipelineInfo']['name']) | ||
if pipeline_description: | ||
pipeline_description = '\n# '.join( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the length of this string is based on the size of the '# Description'
string. Can you implement that programmatically to make the coupling explicit?
# test comments work on compiled container components | ||
self.assertIn(predicted_comment, yaml_content) | ||
|
||
def test_comments_indempotency(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: indempotency
-> idempotency
pipeline_spec_path = os.path.join(tmpdir, 'output.yaml') | ||
compiler.Compiler().compile( | ||
pipeline_func=my_pipeline, package_path=pipeline_spec_path) | ||
comp = components.load_component_from_file(pipeline_spec_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You can move this up to the original temporary directory to avoid having to compile again.
/retest |
2786638
to
d289edd
Compare
|
||
return comment_strings | ||
|
||
if 'root' not in pipeline_spec: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into this. It's attributed to these tests: sdk/python/kfp/compiler/pipeline_spec_builder_test.py::TestWriteIrToFile
, which attempt to write an empty PipelineSpec
.
Can you update these these? It's preferable to not program around our tests in the library code.
/test kubeflow-pipeline-e2e-test |
/retest |
We can ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for all your hard work on this, @JOCSTAA! This is a really great feature.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: connor-mccarthy The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@JOCSTAA: The following tests failed, say
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
|
* base * add tests * fix bug * adress comments * address comments 2 * sort comments * sort signatures * add indempotent test * add indempotent test2 * support multiline docstring * review * docformatter presubmit exclude * docformatter presubmit exclude * docformatter * docformatter * merge 1 * update readme * nit .items() * remove reduntant test
Description of your changes:
Implements a feature for the KFP SDK DSL compiler to include a summary of the pipeline details( including name, description and input/output signatures) as a comment to the IR YAML file
Checklist: